Bilingual Distributed Word Representations from Document-Aligned Comparable Data

Abstract

We propose a new model for learning bilingual word representations from non-parallel document-aligned data. Following the recent advances in word representation learning, our model learns dense real-valued word vectors, that is, bilingual word embeddings (BWEs). Unlike prior work on inducing BWEs which heavily relied on parallel sentence-aligned corpora and/or readily available translation resources such as dictionaries, the article reveals that BWEs may be learned solely on the basis of document-aligned comparable data without any additional lexical resources nor syntactic information. We present a comparison of our approach with previous state-of-the-art models for learning bilingual word representations from comparable data that rely on the framework of multilingual probabilistic topic modeling (MuPTM), as well as with distributional local context-counting models. We demonstrate the utility of the induced BWEs in two semantic tasks: (1) bilingual lexicon extraction, (2) suggesting word translations in context for polysemous words. Our simple yet effective BWE-based models significantly outperform the MuPTM-based and context-counting representation models from comparable data as well as prior BWE-based models, and acquire the best reported results on both tasks for all three tested language pairs.
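
To make the first task concrete, the following is a minimal sketch of how bilingual lexicon extraction is commonly performed once words of both languages live in one shared embedding space: each source word is paired with the target word whose vector has the highest cosine similarity. This is an illustration under stated assumptions, not the authors' implementation. The function names `cosine` and `extract_lexicon`, the three-dimensional vectors, and the example words are all hypothetical and chosen only for readability; the abstract does not specify the similarity measure or the vocabulary.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def extract_lexicon(source_vecs, target_vecs):
    """Map each source word to its nearest target word by cosine similarity.

    Assumes both dictionaries hold vectors from the same shared bilingual space.
    """
    lexicon = {}
    for source_word, source_vec in source_vecs.items():
        lexicon[source_word] = max(
            target_vecs,
            key=lambda target_word: cosine(source_vec, target_vecs[target_word]),
        )
    return lexicon


# Toy, made-up embeddings (hypothetical values, not taken from the paper).
source_vecs = {
    "dog": [0.9, 0.1, 0.0],
    "house": [0.1, 0.8, 0.2],
}
target_vecs = {
    "perro": [0.8, 0.2, 0.1],
    "casa": [0.0, 0.9, 0.3],
    "rio": [0.1, 0.1, 0.9],
}

lexicon = extract_lexicon(source_vecs, target_vecs)
print(lexicon)
```

With real embeddings the exhaustive comparison above would typically be replaced by a vectorized or approximate nearest-neighbour search, since the cost grows with the product of the two vocabulary sizes.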
